{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "013ba85f-338c-4884-8056-a34d7f2c4949",
   "metadata": {},
   "source": [
    "### Searching the ChEMBL Database for Interesting SAR \n",
    "<img src=\"https://raw.githubusercontent.com/PatWalters/practical_cheminformatics_posts/refs/heads/main/chembl_herg/sherlock_robot.png\" width=300/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "060fd28c-2c14-4714-857b-b3e77b098333",
   "metadata": {},
   "source": [
    "### 0. Introduction\n",
    "This analysis is based on a conversation I had with some friends working on the [OpenADMET](https://openadmet.org/) project. One goal of OpenADMET is to identify the structures of ligands bound to off-targets like hERG, an ion channel involved in cardiac liabilities.  Those interested in learning more about strategies for hERG optimization may want to consult [this 2006 paper](https://pubs.acs.org/doi/full/10.1021/jm060379l) from J. Med. Chem. or [this video](https://www.youtube.com/watch?v=-GBhvMpwJl0) from Drug Hunter. \n",
    "\n",
    "I thought it would be interesting to find examples in the medicinal chemistry literature where a hERG liability was addressed. I aimed to find congeneric series with a range of hERG activities. Having series that include hERG binders and closely related non-binders can provide insights into strategies for reducing liabilities. If we obtain a cryoEM structure of the hERG binder, can we use it to explain why the inactive analog lacks hERG activity? To explore this, I searched ChEMBL for cases where a team tackled a hERG liability. I conducted a search to identify sets of similar compounds from the same paper with at least a 10-fold difference in hERG activity. This notebook details the steps I took in this process."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a9279def9f250f3",
   "metadata": {},
   "source": [
    "Install the necessary Python packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "55c1f2512a7c3e7f",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T11:41:09.578100Z",
     "start_time": "2025-07-05T11:41:09.576019Z"
    }
   },
   "outputs": [],
   "source": [
    "%%capture\n",
    "import sys\n",
    "IN_COLAB = 'google.colab' in sys.modules\n",
    "if IN_COLAB:\n",
    "    !pip install -q rdkit mols2grid ipywidgets tqdm useful_rdkit_utils molbloom"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8e61930-db96-4cb7-b821-86f4bff4a6b5",
   "metadata": {},
   "source": [
    "Load the necessary Python libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "554f9bcf-ee4e-4bc0-a126-a0e49543b60f",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:15:15.503982Z",
     "start_time": "2025-07-05T14:15:15.501168Z"
    }
   },
   "outputs": [],
   "source": [
    "import sqlite3\n",
    "import pandas as pd\n",
    "from tqdm.auto import tqdm\n",
    "import useful_rdkit_utils as uru\n",
    "import numpy as np\n",
    "from rdkit import Chem\n",
    "from molbloom import buy\n",
    "import mols2grid\n",
    "from ipywidgets import interact\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "654d7f76-7f9f-48af-b55c-abf30169daf1",
   "metadata": {},
   "source": [
    "Enable progress bars for Pandas operations "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "9e0fe383-920d-41fd-b186-a978fea8948a",
   "metadata": {},
   "outputs": [],
   "source": [
    "tqdm.pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55238265-bc8a-406c-9f52-ced0b54ee7d5",
   "metadata": {},
   "source": [
    "### 1. Download the ChEMBL database \n",
    "If in Google Colab download and unzip the ChEMBL database.  This takes about 3.5 minutes. If you're running this notebook locally, change the `db_filename` to reflect the location of your ChEMBL sqlite database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "6e0921eb-6e42-4998-bd7b-537f6087abac",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:14:23.991354Z",
     "start_time": "2025-07-05T14:14:23.985530Z"
    }
   },
   "outputs": [],
   "source": [
    "db_filename = \"/Users/pwalters/.data/chembl/35/chembl_35.db\"\n",
    "if IN_COLAB:\n",
    "    db_filename = \"./chembl_35/chembl_35_sqlite/chembl_35.db\"\n",
    "    if not os.path.exists(db_filename):\n",
    "        !wget https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_35_sqlite.tar.gz\n",
    "        !tar -xzf chembl_35_sqlite.tar.gz"
   ]
  },
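  {
   "cell_type": "markdown",
   "id": "a1c3e5f7-sketch-sqlite-check",
   "metadata": {},
   "source": [
    "Before running any queries, it can help to confirm that the file at `db_filename` really is a readable SQLite database. The block below is a minimal sketch rather than part of the original workflow: `list_sqlite_tables` is a hypothetical helper name, and it is demonstrated on a throwaway database so that it runs without ChEMBL.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import sqlite3\n",
    "import tempfile\n",
    "\n",
    "\n",
    "def list_sqlite_tables(db_path):\n",
    "    # sqlite3.connect silently creates an empty file for a missing path,\n",
    "    # so check for the file explicitly first.\n",
    "    if not os.path.exists(db_path):\n",
    "        raise FileNotFoundError(f'No database found at {db_path}')\n",
    "    con = sqlite3.connect(db_path)\n",
    "    try:\n",
    "        rows = con.execute(\n",
    "            'SELECT name FROM sqlite_master WHERE type = ? ORDER BY name',\n",
    "            ('table',),\n",
    "        ).fetchall()\n",
    "    finally:\n",
    "        con.close()\n",
    "    return [row[0] for row in rows]\n",
    "\n",
    "\n",
    "# Demonstrate on a temporary database with two empty tables.\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    demo_path = os.path.join(tmp_dir, 'demo.db')\n",
    "    con = sqlite3.connect(demo_path)\n",
    "    con.execute('CREATE TABLE activities (activity_id INTEGER)')\n",
    "    con.execute('CREATE TABLE assays (assay_id INTEGER)')\n",
    "    con.commit()\n",
    "    con.close()\n",
    "    demo_tables = list_sqlite_tables(demo_path)\n",
    "\n",
    "print(demo_tables)\n",
    "```\n",
    "\n",
    "In this notebook, `list_sqlite_tables(db_filename)` should return the ChEMBL table names, including `activities`, `assays`, and `target_dictionary`. An empty list or an exception suggests the path is wrong or the download was incomplete."
   ]
  },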
  {
   "cell_type": "markdown",
   "id": "8e28b145-3a0b-4d5b-8ce5-830bf7b6be42",
   "metadata": {},
   "source": [
    "### 2. Find the ChEMBL ID for hERG"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2d31da24-fafe-4292-8878-9cdfd4b75765",
   "metadata": {},
   "source": [
    "To begin, we know that the Uniprot ID for hERG is [Q12809](https://www.uniprot.org/uniprotkb/Q12809/entry). To search ChEMBL for compounds tested against hERG, we need to convert that Uniprot ID to a ChEMBL ID. There is a file called `chembl_unitprot_mapping.txt` available on the ChEMBL download site that maps ChEMBL IDs to Uniprot IDs. Let's load that file into a Pandas DataFrame so we can search it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "40ac1695-647f-431f-9b60-441cbcce8b3d",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:14:28.496033Z",
     "start_time": "2025-07-05T14:14:27.429679Z"
    }
   },
   "outputs": [
    {
     "ename": "URLError",
     "evalue": "<urlopen error [Errno 8] nodename nor servname provided, or not known>",
     "output_type": "error",
     "traceback": [
      "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
      "\u001b[31mgaierror\u001b[39m                                  Traceback (most recent call last)",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/urllib/request.py:1348\u001b[39m, in \u001b[36mAbstractHTTPHandler.do_open\u001b[39m\u001b[34m(self, http_class, req, **http_conn_args)\u001b[39m\n\u001b[32m   1347\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m-> \u001b[39m\u001b[32m1348\u001b[39m     \u001b[43mh\u001b[49m\u001b[43m.\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget_method\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mreq\u001b[49m\u001b[43m.\u001b[49m\u001b[43mselector\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mreq\u001b[49m\u001b[43m.\u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mheaders\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1349\u001b[39m \u001b[43m              \u001b[49m\u001b[43mencode_chunked\u001b[49m\u001b[43m=\u001b[49m\u001b[43mreq\u001b[49m\u001b[43m.\u001b[49m\u001b[43mhas_header\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mTransfer-encoding\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1350\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mOSError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err: \u001b[38;5;66;03m# timeout error\u001b[39;00m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/http/client.py:1303\u001b[39m, in \u001b[36mHTTPConnection.request\u001b[39m\u001b[34m(self, method, url, body, headers, encode_chunked)\u001b[39m\n\u001b[32m   1302\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"Send a complete request to the server.\"\"\"\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1303\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_send_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43murl\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbody\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mheaders\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mencode_chunked\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/http/client.py:1349\u001b[39m, in \u001b[36mHTTPConnection._send_request\u001b[39m\u001b[34m(self, method, url, body, headers, encode_chunked)\u001b[39m\n\u001b[32m   1348\u001b[39m     body = _encode(body, \u001b[33m'\u001b[39m\u001b[33mbody\u001b[39m\u001b[33m'\u001b[39m)\n\u001b[32m-> \u001b[39m\u001b[32m1349\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mendheaders\u001b[49m\u001b[43m(\u001b[49m\u001b[43mbody\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mencode_chunked\u001b[49m\u001b[43m=\u001b[49m\u001b[43mencode_chunked\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/http/client.py:1298\u001b[39m, in \u001b[36mHTTPConnection.endheaders\u001b[39m\u001b[34m(self, message_body, encode_chunked)\u001b[39m\n\u001b[32m   1297\u001b[39m     \u001b[38;5;28;01mraise\u001b[39;00m CannotSendHeader()\n\u001b[32m-> \u001b[39m\u001b[32m1298\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_send_output\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmessage_body\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mencode_chunked\u001b[49m\u001b[43m=\u001b[49m\u001b[43mencode_chunked\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/http/client.py:1058\u001b[39m, in \u001b[36mHTTPConnection._send_output\u001b[39m\u001b[34m(self, message_body, encode_chunked)\u001b[39m\n\u001b[32m   1057\u001b[39m \u001b[38;5;28;01mdel\u001b[39;00m \u001b[38;5;28mself\u001b[39m._buffer[:]\n\u001b[32m-> \u001b[39m\u001b[32m1058\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43msend\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmsg\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1060\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m message_body \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m   1061\u001b[39m \n\u001b[32m   1062\u001b[39m     \u001b[38;5;66;03m# create a consistent interface to message_body\u001b[39;00m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/http/client.py:996\u001b[39m, in \u001b[36mHTTPConnection.send\u001b[39m\u001b[34m(self, data)\u001b[39m\n\u001b[32m    995\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m.auto_open:\n\u001b[32m--> \u001b[39m\u001b[32m996\u001b[39m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mconnect\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    997\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/http/client.py:1468\u001b[39m, in \u001b[36mHTTPSConnection.connect\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m   1466\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mConnect to a host on a given (SSL) port.\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m-> \u001b[39m\u001b[32m1468\u001b[39m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43mconnect\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1470\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m._tunnel_host:\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/http/client.py:962\u001b[39m, in \u001b[36mHTTPConnection.connect\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m    961\u001b[39m sys.audit(\u001b[33m\"\u001b[39m\u001b[33mhttp.client.connect\u001b[39m\u001b[33m\"\u001b[39m, \u001b[38;5;28mself\u001b[39m, \u001b[38;5;28mself\u001b[39m.host, \u001b[38;5;28mself\u001b[39m.port)\n\u001b[32m--> \u001b[39m\u001b[32m962\u001b[39m \u001b[38;5;28mself\u001b[39m.sock = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_create_connection\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m    963\u001b[39m \u001b[43m    \u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mhost\u001b[49m\u001b[43m,\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mport\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43msource_address\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    964\u001b[39m \u001b[38;5;66;03m# Might fail in OSs that don't implement TCP_NODELAY\u001b[39;00m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/socket.py:839\u001b[39m, in \u001b[36mcreate_connection\u001b[39m\u001b[34m(address, timeout, source_address, all_errors)\u001b[39m\n\u001b[32m    838\u001b[39m exceptions = []\n\u001b[32m--> \u001b[39m\u001b[32m839\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m res \u001b[38;5;129;01min\u001b[39;00m \u001b[43mgetaddrinfo\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhost\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mport\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m0\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mSOCK_STREAM\u001b[49m\u001b[43m)\u001b[49m:\n\u001b[32m    840\u001b[39m     af, socktype, proto, canonname, sa = res\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/socket.py:974\u001b[39m, in \u001b[36mgetaddrinfo\u001b[39m\u001b[34m(host, port, family, type, proto, flags)\u001b[39m\n\u001b[32m    973\u001b[39m addrlist = []\n\u001b[32m--> \u001b[39m\u001b[32m974\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m res \u001b[38;5;129;01min\u001b[39;00m \u001b[43m_socket\u001b[49m\u001b[43m.\u001b[49m\u001b[43mgetaddrinfo\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhost\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mport\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfamily\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mtype\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mproto\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mflags\u001b[49m\u001b[43m)\u001b[49m:\n\u001b[32m    975\u001b[39m     af, socktype, proto, canonname, sa = res\n",
      "\u001b[31mgaierror\u001b[39m: [Errno 8] nodename nor servname provided, or not known",
      "\nDuring handling of the above exception, another exception occurred:\n",
      "\u001b[31mURLError\u001b[39m                                  Traceback (most recent call last)",
      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[23]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m df = \u001b[43mpd\u001b[49m\u001b[43m.\u001b[49m\u001b[43mread_csv\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mhttps://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_uniprot_mapping.txt\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43msep\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[38;5;130;43;01m\\t\u001b[39;49;00m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[32m      2\u001b[39m \u001b[43m                \u001b[49m\u001b[43mcomment\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43m#\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[32m      3\u001b[39m \u001b[43m                \u001b[49m\u001b[43mnames\u001b[49m\u001b[43m=\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43muniprot_id\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mchembl_id\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mtarget_name\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mtarget_type\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026\u001b[39m, in \u001b[36mread_csv\u001b[39m\u001b[34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)\u001b[39m\n\u001b[32m   1013\u001b[39m kwds_defaults = _refine_defaults_read(\n\u001b[32m   1014\u001b[39m     dialect,\n\u001b[32m   1015\u001b[39m     delimiter,\n\u001b[32m   (...)\u001b[39m\u001b[32m   1022\u001b[39m     dtype_backend=dtype_backend,\n\u001b[32m   1023\u001b[39m )\n\u001b[32m   1024\u001b[39m kwds.update(kwds_defaults)\n\u001b[32m-> \u001b[39m\u001b[32m1026\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_read\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilepath_or_buffer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/site-packages/pandas/io/parsers/readers.py:620\u001b[39m, in \u001b[36m_read\u001b[39m\u001b[34m(filepath_or_buffer, kwds)\u001b[39m\n\u001b[32m    617\u001b[39m _validate_names(kwds.get(\u001b[33m\"\u001b[39m\u001b[33mnames\u001b[39m\u001b[33m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m))\n\u001b[32m    619\u001b[39m \u001b[38;5;66;03m# Create the parser.\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m620\u001b[39m parser = \u001b[43mTextFileReader\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilepath_or_buffer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    622\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m chunksize \u001b[38;5;129;01mor\u001b[39;00m iterator:\n\u001b[32m    623\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m parser\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1620\u001b[39m, in \u001b[36mTextFileReader.__init__\u001b[39m\u001b[34m(self, f, engine, **kwds)\u001b[39m\n\u001b[32m   1617\u001b[39m     \u001b[38;5;28mself\u001b[39m.options[\u001b[33m\"\u001b[39m\u001b[33mhas_index_names\u001b[39m\u001b[33m\"\u001b[39m] = kwds[\u001b[33m\"\u001b[39m\u001b[33mhas_index_names\u001b[39m\u001b[33m\"\u001b[39m]\n\u001b[32m   1619\u001b[39m \u001b[38;5;28mself\u001b[39m.handles: IOHandles | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1620\u001b[39m \u001b[38;5;28mself\u001b[39m._engine = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_make_engine\u001b[49m\u001b[43m(\u001b[49m\u001b[43mf\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mengine\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1880\u001b[39m, in \u001b[36mTextFileReader._make_engine\u001b[39m\u001b[34m(self, f, engine)\u001b[39m\n\u001b[32m   1878\u001b[39m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[33m\"\u001b[39m\u001b[33mb\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m mode:\n\u001b[32m   1879\u001b[39m         mode += \u001b[33m\"\u001b[39m\u001b[33mb\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m-> \u001b[39m\u001b[32m1880\u001b[39m \u001b[38;5;28mself\u001b[39m.handles = \u001b[43mget_handle\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m   1881\u001b[39m \u001b[43m    \u001b[49m\u001b[43mf\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1882\u001b[39m \u001b[43m    \u001b[49m\u001b[43mmode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1883\u001b[39m \u001b[43m    \u001b[49m\u001b[43mencoding\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43moptions\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mencoding\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1884\u001b[39m \u001b[43m    \u001b[49m\u001b[43mcompression\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43moptions\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mcompression\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1885\u001b[39m \u001b[43m    
\u001b[49m\u001b[43mmemory_map\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43moptions\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mmemory_map\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1886\u001b[39m \u001b[43m    \u001b[49m\u001b[43mis_text\u001b[49m\u001b[43m=\u001b[49m\u001b[43mis_text\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1887\u001b[39m \u001b[43m    \u001b[49m\u001b[43merrors\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43moptions\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mencoding_errors\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mstrict\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1888\u001b[39m \u001b[43m    \u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43moptions\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mstorage_options\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1889\u001b[39m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1890\u001b[39m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m.handles \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m   1891\u001b[39m f = \u001b[38;5;28mself\u001b[39m.handles.handle\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/site-packages/pandas/io/common.py:728\u001b[39m, in \u001b[36mget_handle\u001b[39m\u001b[34m(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)\u001b[39m\n\u001b[32m    725\u001b[39m     codecs.lookup_error(errors)\n\u001b[32m    727\u001b[39m \u001b[38;5;66;03m# open URLs\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m728\u001b[39m ioargs = \u001b[43m_get_filepath_or_buffer\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m    729\u001b[39m \u001b[43m    \u001b[49m\u001b[43mpath_or_buf\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    730\u001b[39m \u001b[43m    \u001b[49m\u001b[43mencoding\u001b[49m\u001b[43m=\u001b[49m\u001b[43mencoding\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    731\u001b[39m \u001b[43m    \u001b[49m\u001b[43mcompression\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcompression\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    732\u001b[39m \u001b[43m    \u001b[49m\u001b[43mmode\u001b[49m\u001b[43m=\u001b[49m\u001b[43mmode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    733\u001b[39m \u001b[43m    \u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m=\u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    734\u001b[39m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    736\u001b[39m handle = ioargs.filepath_or_buffer\n\u001b[32m    737\u001b[39m handles: \u001b[38;5;28mlist\u001b[39m[BaseBuffer]\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/site-packages/pandas/io/common.py:384\u001b[39m, in \u001b[36m_get_filepath_or_buffer\u001b[39m\u001b[34m(filepath_or_buffer, encoding, compression, mode, storage_options)\u001b[39m\n\u001b[32m    382\u001b[39m \u001b[38;5;66;03m# assuming storage_options is to be interpreted as headers\u001b[39;00m\n\u001b[32m    383\u001b[39m req_info = urllib.request.Request(filepath_or_buffer, headers=storage_options)\n\u001b[32m--> \u001b[39m\u001b[32m384\u001b[39m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[43murlopen\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq_info\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;28;01mas\u001b[39;00m req:\n\u001b[32m    385\u001b[39m     content_encoding = req.headers.get(\u001b[33m\"\u001b[39m\u001b[33mContent-Encoding\u001b[39m\u001b[33m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n\u001b[32m    386\u001b[39m     \u001b[38;5;28;01mif\u001b[39;00m content_encoding == \u001b[33m\"\u001b[39m\u001b[33mgzip\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m    387\u001b[39m         \u001b[38;5;66;03m# Override compression based on Content-Encoding header\u001b[39;00m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/site-packages/pandas/io/common.py:289\u001b[39m, in \u001b[36murlopen\u001b[39m\u001b[34m(*args, **kwargs)\u001b[39m\n\u001b[32m    283\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m    284\u001b[39m \u001b[33;03mLazy-import wrapper for stdlib urlopen, as that imports a big chunk of\u001b[39;00m\n\u001b[32m    285\u001b[39m \u001b[33;03mthe stdlib.\u001b[39;00m\n\u001b[32m    286\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m    287\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01murllib\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mrequest\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m289\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43murllib\u001b[49m\u001b[43m.\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m.\u001b[49m\u001b[43murlopen\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/urllib/request.py:216\u001b[39m, in \u001b[36murlopen\u001b[39m\u001b[34m(url, data, timeout, cafile, capath, cadefault, context)\u001b[39m\n\u001b[32m    214\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m    215\u001b[39m     opener = _opener\n\u001b[32m--> \u001b[39m\u001b[32m216\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mopener\u001b[49m\u001b[43m.\u001b[49m\u001b[43mopen\u001b[49m\u001b[43m(\u001b[49m\u001b[43murl\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/urllib/request.py:519\u001b[39m, in \u001b[36mOpenerDirector.open\u001b[39m\u001b[34m(self, fullurl, data, timeout)\u001b[39m\n\u001b[32m    516\u001b[39m     req = meth(req)\n\u001b[32m    518\u001b[39m sys.audit(\u001b[33m'\u001b[39m\u001b[33murllib.Request\u001b[39m\u001b[33m'\u001b[39m, req.full_url, req.data, req.headers, req.get_method())\n\u001b[32m--> \u001b[39m\u001b[32m519\u001b[39m response = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_open\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdata\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    521\u001b[39m \u001b[38;5;66;03m# post-process response\u001b[39;00m\n\u001b[32m    522\u001b[39m meth_name = protocol+\u001b[33m\"\u001b[39m\u001b[33m_response\u001b[39m\u001b[33m\"\u001b[39m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/urllib/request.py:536\u001b[39m, in \u001b[36mOpenerDirector._open\u001b[39m\u001b[34m(self, req, data)\u001b[39m\n\u001b[32m    533\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m result\n\u001b[32m    535\u001b[39m protocol = req.type\n\u001b[32m--> \u001b[39m\u001b[32m536\u001b[39m result = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_call_chain\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mhandle_open\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mprotocol\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mprotocol\u001b[49m\u001b[43m \u001b[49m\u001b[43m+\u001b[49m\n\u001b[32m    537\u001b[39m \u001b[43m                          \u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43m_open\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mreq\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    538\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m result:\n\u001b[32m    539\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m result\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/urllib/request.py:496\u001b[39m, in \u001b[36mOpenerDirector._call_chain\u001b[39m\u001b[34m(self, chain, kind, meth_name, *args)\u001b[39m\n\u001b[32m    494\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m handler \u001b[38;5;129;01min\u001b[39;00m handlers:\n\u001b[32m    495\u001b[39m     func = \u001b[38;5;28mgetattr\u001b[39m(handler, meth_name)\n\u001b[32m--> \u001b[39m\u001b[32m496\u001b[39m     result = \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    497\u001b[39m     \u001b[38;5;28;01mif\u001b[39;00m result \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m    498\u001b[39m         \u001b[38;5;28;01mreturn\u001b[39;00m result\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/urllib/request.py:1391\u001b[39m, in \u001b[36mHTTPSHandler.https_open\u001b[39m\u001b[34m(self, req)\u001b[39m\n\u001b[32m   1390\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mhttps_open\u001b[39m(\u001b[38;5;28mself\u001b[39m, req):\n\u001b[32m-> \u001b[39m\u001b[32m1391\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mdo_open\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhttp\u001b[49m\u001b[43m.\u001b[49m\u001b[43mclient\u001b[49m\u001b[43m.\u001b[49m\u001b[43mHTTPSConnection\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mreq\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1392\u001b[39m \u001b[43m        \u001b[49m\u001b[43mcontext\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_context\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcheck_hostname\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_check_hostname\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[36mFile \u001b[39m\u001b[32m/opt/homebrew/Caskroom/miniforge/base/envs/rdkit_2025_09/lib/python3.11/urllib/request.py:1351\u001b[39m, in \u001b[36mAbstractHTTPHandler.do_open\u001b[39m\u001b[34m(self, http_class, req, **http_conn_args)\u001b[39m\n\u001b[32m   1348\u001b[39m         h.request(req.get_method(), req.selector, req.data, headers,\n\u001b[32m   1349\u001b[39m                   encode_chunked=req.has_header(\u001b[33m'\u001b[39m\u001b[33mTransfer-encoding\u001b[39m\u001b[33m'\u001b[39m))\n\u001b[32m   1350\u001b[39m     \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mOSError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err: \u001b[38;5;66;03m# timeout error\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1351\u001b[39m         \u001b[38;5;28;01mraise\u001b[39;00m URLError(err)\n\u001b[32m   1352\u001b[39m     r = h.getresponse()\n\u001b[32m   1353\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m:\n",
      "\u001b[31mURLError\u001b[39m: <urlopen error [Errno 8] nodename nor servname provided, or not known>"
     ]
    }
   ],
   "source": [
    "df = pd.read_csv(\"https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_uniprot_mapping.txt\",sep=\"\\t\",\n",
    "                comment='#',\n",
    "                names=['uniprot_id','chembl_id','target_name','target_type'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8355b8c3-a360-4980-ba3a-b8efc8b2fefd",
   "metadata": {},
   "source": [
    "Now we'll search the dataframe for that Uniprot id. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0dad8db6-81b9-41a0-a19c-3dc5ceb150cf",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:14:31.052729Z",
     "start_time": "2025-07-05T14:14:31.038934Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>uniprot_id</th>\n",
       "      <th>chembl_id</th>\n",
       "      <th>target_name</th>\n",
       "      <th>target_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2695</th>\n",
       "      <td>Q9UNQ0</td>\n",
       "      <td>CHEMBL5393</td>\n",
       "      <td>ATP-binding cassette sub-family G member 2</td>\n",
       "      <td>SINGLE PROTEIN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     uniprot_id   chembl_id                                 target_name  \\\n",
       "2695     Q9UNQ0  CHEMBL5393  ATP-binding cassette sub-family G member 2   \n",
       "\n",
       "         target_type  \n",
       "2695  SINGLE PROTEIN  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query(\"uniprot_id  == 'Q9UNQ0' and target_type == 'SINGLE PROTEIN'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b738bbb8-833d-450f-8524-e7e86aa1aa8f",
   "metadata": {},
   "source": [
    "### 3. Query the ChEMBL database for hERG data\n",
    "\n",
    "Create a connection to the ChEMBL database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "e6cd6942-cc9b-4bf8-8a33-53bc92a651b2",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:14:33.878049Z",
     "start_time": "2025-07-05T14:14:33.874468Z"
    }
   },
   "outputs": [],
   "source": [
    "conn = sqlite3.connect(db_filename)\n",
    "cursor = conn.cursor()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a471830-43ed-4e5a-9ead-65a274ef65be",
   "metadata": {},
   "source": [
    "Here's the sql query to grab all the hERG data and surrounding information from ChEMBL.  \n",
    "\n",
    "- **target_dictionary**: Filters targets by chembl_id (CHEMBL240).\n",
    "- **assays**: Links targets (tid) to assays (assay_id).\n",
    "- **docs**: Provides document details (doc_id, doi, title).\n",
    "- **activities**: Connects assays to compound records and provides standard_value and standard_units.\n",
    "- **compound_records**: Links compound records to molecular structures and provides compound_id.\n",
    "- **compound_structures**: Extracts canonical_smiles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "5abaae3d-2eee-4bdc-aff3-654fae3f9161",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:14:42.297723Z",
     "start_time": "2025-07-05T14:14:42.295056Z"
    }
   },
   "outputs": [],
   "source": [
    "sql = \"\"\"SELECT\n",
    "    d.doc_id,\n",
    "    d.doi,\n",
    "    d.title,\n",
    "    a.assay_id,\n",
    "    cr.compound_key,\n",
    "    cs.molregno,\n",
    "    cs.canonical_smiles,\n",
    "    act.standard_type,\n",
    "    act.standard_value,\n",
    "    act.standard_relation,\n",
    "    act.standard_units,\n",
    "    act.pchembl_value\n",
    "FROM\n",
    "    target_dictionary td\n",
    "        JOIN\n",
    "    assays a ON td.tid = a.tid\n",
    "        JOIN\n",
    "    docs d ON a.doc_id = d.doc_id\n",
    "        JOIN\n",
    "    activities act ON a.assay_id = act.assay_id AND d.doc_id = act.doc_id\n",
    "        JOIN\n",
    "    compound_records cr ON act.record_id = cr.record_id\n",
    "        JOIN\n",
    "    compound_structures cs ON cr.molregno = cs.molregno\n",
    "WHERE\n",
    "    td.chembl_id = 'CHEMBL240';\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e1fa5f7-fa4a-4fab-88a3-9cba35781a23",
   "metadata": {},
   "source": [
    "Run the query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "2a3b6eb9-ae1f-41a9-aec3-aff90606a8a7",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:14:47.624120Z",
     "start_time": "2025-07-05T14:14:47.269473Z"
    }
   },
   "outputs": [],
   "source": [
    "res = cursor.execute(sql)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75a3b26c-361c-454a-aa97-c3a97970983e",
   "metadata": {},
   "source": [
    "Load the results into a Pandas dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3ce409a7-fa4d-4a07-94ec-299df3c63614",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-07-05T14:14:53.621359Z",
     "start_time": "2025-07-05T14:14:50.361791Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30467"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_sql_query(sql,conn)\n",
    "len(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "9adb479c-ceac-4092-9b0a-b71746b6b4d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>doc_id</th>\n",
       "      <th>doi</th>\n",
       "      <th>title</th>\n",
       "      <th>assay_id</th>\n",
       "      <th>compound_key</th>\n",
       "      <th>molregno</th>\n",
       "      <th>canonical_smiles</th>\n",
       "      <th>standard_type</th>\n",
       "      <th>standard_value</th>\n",
       "      <th>standard_relation</th>\n",
       "      <th>standard_units</th>\n",
       "      <th>pchembl_value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4905</td>\n",
       "      <td>10.1016/s0960-894x(02)00250-0</td>\n",
       "      <td>4,4-Disubstituted cyclohexylamine NK(1) recept...</td>\n",
       "      <td>220871</td>\n",
       "      <td>5f</td>\n",
       "      <td>75221</td>\n",
       "      <td>CC(C(=O)N[C@]1(c2ccccc2)CC[C@@H](N2CCN(C(C)C)C...</td>\n",
       "      <td>Ki</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>=</td>\n",
       "      <td>nM</td>\n",
       "      <td>5.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4905</td>\n",
       "      <td>10.1016/s0960-894x(02)00250-0</td>\n",
       "      <td>4,4-Disubstituted cyclohexylamine NK(1) recept...</td>\n",
       "      <td>220871</td>\n",
       "      <td>5i</td>\n",
       "      <td>1595020</td>\n",
       "      <td>CC(C(=O)NC1(c2ccccc2)CCC(N2CCC3(CCCO3)CC2)CC1)...</td>\n",
       "      <td>Ki</td>\n",
       "      <td>730.0</td>\n",
       "      <td>=</td>\n",
       "      <td>nM</td>\n",
       "      <td>6.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4905</td>\n",
       "      <td>10.1016/s0960-894x(02)00250-0</td>\n",
       "      <td>4,4-Disubstituted cyclohexylamine NK(1) recept...</td>\n",
       "      <td>220871</td>\n",
       "      <td>2</td>\n",
       "      <td>75085</td>\n",
       "      <td>CC(C(=O)N[C@]1(c2ccccc2)CC[C@@H](N2CCC(c3ccc(F...</td>\n",
       "      <td>Ki</td>\n",
       "      <td>43.0</td>\n",
       "      <td>=</td>\n",
       "      <td>nM</td>\n",
       "      <td>7.37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4905</td>\n",
       "      <td>10.1016/s0960-894x(02)00250-0</td>\n",
       "      <td>4,4-Disubstituted cyclohexylamine NK(1) recept...</td>\n",
       "      <td>220871</td>\n",
       "      <td>5b</td>\n",
       "      <td>1595015</td>\n",
       "      <td>CC(C(=O)N[C@]1(c2ccccc2)CC[C@H](N2CCCCC2)CC1)c...</td>\n",
       "      <td>Ki</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>=</td>\n",
       "      <td>nM</td>\n",
       "      <td>5.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4905</td>\n",
       "      <td>10.1016/s0960-894x(02)00250-0</td>\n",
       "      <td>4,4-Disubstituted cyclohexylamine NK(1) recept...</td>\n",
       "      <td>220871</td>\n",
       "      <td>5c</td>\n",
       "      <td>1595019</td>\n",
       "      <td>CC1CCN([C@H]2CC[C@](NC(=O)C(C)c3cc(C(F)(F)F)cc...</td>\n",
       "      <td>Ki</td>\n",
       "      <td>460.0</td>\n",
       "      <td>=</td>\n",
       "      <td>nM</td>\n",
       "      <td>6.34</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   doc_id                            doi  \\\n",
       "0    4905  10.1016/s0960-894x(02)00250-0   \n",
       "1    4905  10.1016/s0960-894x(02)00250-0   \n",
       "2    4905  10.1016/s0960-894x(02)00250-0   \n",
       "3    4905  10.1016/s0960-894x(02)00250-0   \n",
       "4    4905  10.1016/s0960-894x(02)00250-0   \n",
       "\n",
       "                                               title  assay_id compound_key  \\\n",
       "0  4,4-Disubstituted cyclohexylamine NK(1) recept...    220871           5f   \n",
       "1  4,4-Disubstituted cyclohexylamine NK(1) recept...    220871           5i   \n",
       "2  4,4-Disubstituted cyclohexylamine NK(1) recept...    220871            2   \n",
       "3  4,4-Disubstituted cyclohexylamine NK(1) recept...    220871           5b   \n",
       "4  4,4-Disubstituted cyclohexylamine NK(1) recept...    220871           5c   \n",
       "\n",
       "   molregno                                   canonical_smiles standard_type  \\\n",
       "0     75221  CC(C(=O)N[C@]1(c2ccccc2)CC[C@@H](N2CCN(C(C)C)C...            Ki   \n",
       "1   1595020  CC(C(=O)NC1(c2ccccc2)CCC(N2CCC3(CCCO3)CC2)CC1)...            Ki   \n",
       "2     75085  CC(C(=O)N[C@]1(c2ccccc2)CC[C@@H](N2CCC(c3ccc(F...            Ki   \n",
       "3   1595015  CC(C(=O)N[C@]1(c2ccccc2)CC[C@H](N2CCCCC2)CC1)c...            Ki   \n",
       "4   1595019  CC1CCN([C@H]2CC[C@](NC(=O)C(C)c3cc(C(F)(F)F)cc...            Ki   \n",
       "\n",
       "   standard_value standard_relation standard_units  pchembl_value  \n",
       "0          1200.0                 =             nM           5.92  \n",
       "1           730.0                 =             nM           6.14  \n",
       "2            43.0                 =             nM           7.37  \n",
       "3          1200.0                 =             nM           5.92  \n",
       "4           460.0                 =             nM           6.34  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d659716e-b3c6-4f79-ba28-d471cf592cf9",
   "metadata": {},
   "source": [
    "Only keep compounds with measured values that don't have an operator (>,<) and have standard units in nM (skip entries with %)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "ba7f8eae-8875-40b9-99f3-f8b8b15455ae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "11830"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ok = df.dropna(subset=\"pchembl_value\").query(\"standard_relation == '='\").query(\"standard_units == 'nM'\").copy()\n",
    "len(df_ok)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b55d6b2a-9dca-4c78-98c4-77554d69318c",
   "metadata": {},
   "source": [
    "### 4. Extract congeneric series from papers "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12681e6d-1fba-4bf8-9f16-c44df38f10ad",
   "metadata": {},
   "source": [
    "The function below processes the structures from a paper and identifies congeneric series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "43f2b31e-a4c5-46c4-9597-4bd4977cee88",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_series_from_paper(input_df):\n",
    "    res = []\n",
    "    for k,v in input_df.groupby([\"assay_id\",\"standard_type\",\"standard_units\"]):\n",
    "        paper_df = v.copy()\n",
    "        paper_df['mol'] = paper_df.canonical_smiles.apply(Chem.MolFromSmiles)\n",
    "        activity_df = paper_df[['doc_id','doi','canonical_smiles','molregno','compound_key','standard_value','assay_id','standard_type','standard_units','pchembl_value']]\n",
    "        mol_df, scaffold_df = uru.find_scaffolds(paper_df,smiles_col=\"canonical_smiles\",name_col=\"molregno\", disable_progress=True)\n",
    "        scaffold_smi, out_df = uru.get_molecules_with_scaffold(scaffold_df.Scaffold.values[0], mol_df, activity_df,\n",
    "                                  smiles_col=\"canonical_smiles\", \n",
    "                                  name_col=\"molregno\",\n",
    "                                  activity_col=\"standard_value\",\n",
    "                                  extra_cols=[\"compound_key\",\"assay_id\",\"standard_type\",\"standard_units\",\"doc_id\",\"pchembl_value\"])\n",
    "        out_df.merge(paper_df[[\"canonical_smiles\",\"molregno\",\"compound_key\"]],on=[\"canonical_smiles\",\"molregno\"])\n",
    "        if len(out_df) > 1:\n",
    "            res.append([scaffold_smi,out_df])\n",
    "    return res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d503733-a5d0-4392-aec2-e8742c38a99d",
   "metadata": {},
   "source": [
    "As an example, consider [ChEMBL doc_id `117736`](https://www.ebi.ac.uk/chembl/explore/document/CHEMBL4680090).  We'll do a quick search to see the form the results take. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "26d43e61-cb3e-40fe-9510-bc0d7ed65fa2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[array(['CC(C)n1cc(C(=O)c2cncc(NC(=O)C[*:3])c2)c2c([*:1])nc([*:2])nc21'],\n",
       "        dtype=object),\n",
       "                                      canonical_smiles  molregno  \\\n",
       "  0  CC(C)n1cc(C(=O)c2cncc(NC(=O)Cc3ccc(Cl)cc3)c2)c...   2526613   \n",
       "  1  CC(C)n1cc(C(=O)c2cncc(NC(=O)Cc3ccc(Cl)cc3)c2)c...   1994374   \n",
       "  2  CC(C)n1cc(C(=O)c2cncc(NC(=O)Cc3ccc(C#N)cc3)c2)...   1994376   \n",
       "  3  CC(C)n1cc(C(=O)c2cncc(NC(=O)Cc3ccc(C#N)cc3)c2)...   2531494   \n",
       "  4  CC(C)n1cc(C(=O)c2cncc(NC(=O)Cn3ccc(C4CC4)n3)c2...   1994395   \n",
       "  5  CC(C)n1cc(C(=O)c2cncc(NC(=O)Cn3ccc(C4CC4)n3)c2...   1999216   \n",
       "  6  CC(C)n1cc(C(=O)c2cncc(NC(=O)Cc3ccc(Cl)cc3)c2)c...   1999213   \n",
       "  \n",
       "     standard_value compound_key  assay_id standard_type standard_units  doc_id  \\\n",
       "  0         10800.0            3   2027231            Ki             nM  117736   \n",
       "  1          5200.0            4   2027231            Ki             nM  117736   \n",
       "  2          7100.0            5   2027231            Ki             nM  117736   \n",
       "  3          9700.0            6   2027231            Ki             nM  117736   \n",
       "  4         11900.0            8   2027231            Ki             nM  117736   \n",
       "  5         38900.0           11   2027231            Ki             nM  117736   \n",
       "  6         16900.0           12   2027231            Ki             nM  117736   \n",
       "  \n",
       "     pchembl_value  \n",
       "  0           4.97  \n",
       "  1           5.28  \n",
       "  2           5.15  \n",
       "  3           5.01  \n",
       "  4           4.92  \n",
       "  5           4.41  \n",
       "  6           4.77  ]]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_df = df_ok.query(\"doc_id == 117736\")\n",
    "test_res = get_series_from_paper(test_df)\n",
    "test_res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec3efef2-6fa9-48a3-af9f-8ad9e7159d52",
   "metadata": {},
   "source": [
    "Loop over each paper in `df_ok` and extract the congeneric series. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "0abad57d-8b34-4242-9c08-aab9089e25fd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f652608f7d944bab9a880d04c18d038c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1576 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df_list = []\n",
    "for k,v in tqdm(df_ok.groupby(\"doc_id\")):\n",
    "    if len(v) > 2:\n",
    "        series_res = get_series_from_paper(v)\n",
    "        for scaffold_smi, paper_df in series_res:\n",
    "            if len(scaffold_smi):\n",
    "                paper_df['doi'] = v.doi.values[0]\n",
    "                paper_df['scaffold'] = scaffold_smi[0]\n",
    "                paper_df.drop_duplicates([\"canonical_smiles\",\"molregno\"],inplace=True)\n",
    "                df_list.append(paper_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56ba08a0-7263-4425-b291-de34c0dda88b",
   "metadata": {},
   "source": [
    "Save the results to a csv file for future reference. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "caea829e-d41e-4809-ad0f-b18ed15eeb19",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.concat(df_list).to_csv(\"chembl_hERG_summary.csv\",index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33f4aa5a-7c36-4d16-af19-a5b2c155aae9",
   "metadata": {},
   "source": [
    "Take a look at one example. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "921f9ea5-3b3c-4a2f-b3da-851d95bc4cde",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" version=\"1.1\" baseProfile=\"full\" xml:space=\"preserve\" width=\"1000px\" height=\"200px\" viewBox=\"0 0 1000 200\">\n",
       "<!-- END OF HEADER -->\n",
       "<rect style=\"opacity:1.0;fill:#FFFFFF;stroke:none\" width=\"200.0\" height=\"200.0\" x=\"0.0\" y=\"0.0\"> </rect>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 10.0,134.0 L 25.1,133.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 25.1,133.0 L 31.8,119.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 31.8,119.3 L 38.3,118.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 38.3,118.9 L 44.8,118.4\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-3 atom-4\" d=\"M 48.3,115.6 L 51.0,110.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-3 atom-4\" d=\"M 51.0,110.1 L 53.7,104.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 53.7,104.6 L 68.8,103.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 68.8,103.6 L 72.1,108.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 72.1,108.6 L 75.5,113.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6 atom-5 atom-7\" d=\"M 68.8,103.6 L 75.5,90.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-7 atom-8\" d=\"M 75.5,90.0 L 81.8,89.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-7 atom-8\" d=\"M 81.8,89.5 L 88.2,89.1\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 91.9,86.2 L 94.6,80.8\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 94.6,80.8 L 97.3,75.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-9 atom-10\" d=\"M 97.3,75.3 L 88.8,62.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-9 atom-10\" d=\"M 98.5,72.9 L 91.5,62.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-10 atom-11\" d=\"M 88.8,62.7 L 95.5,49.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11 atom-11 atom-12\" d=\"M 95.5,49.0 L 110.7,48.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11 atom-11 atom-12\" d=\"M 97.0,51.2 L 109.5,50.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 110.7,48.0 L 119.1,60.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-13 atom-13 atom-14\" d=\"M 119.1,60.6 L 112.5,74.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-13 atom-13 atom-14\" d=\"M 116.5,60.8 L 111.0,72.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-14 atom-14 atom-15\" d=\"M 112.5,74.2 L 120.9,86.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-15 atom-15 atom-16\" d=\"M 120.2,85.8 L 117.4,91.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-15 atom-15 atom-16\" d=\"M 117.4,91.5 L 114.5,97.3\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-15 atom-15 atom-16\" d=\"M 122.2,86.8 L 119.4,92.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-15 atom-15 atom-16\" d=\"M 119.4,92.5 L 116.6,98.3\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-15 atom-17\" d=\"M 120.9,86.8 L 136.1,85.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-17 atom-17 atom-18\" d=\"M 136.1,85.8 L 144.6,98.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-18 atom-18 atom-19\" d=\"M 144.6,98.4 L 159.7,97.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-19 atom-20\" d=\"M 159.7,97.4 L 168.2,110.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-19 atom-20\" d=\"M 162.3,97.2 L 169.3,107.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-20 atom-20 atom-21\" d=\"M 168.2,110.0 L 183.3,108.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-21 atom-21 atom-22\" d=\"M 183.3,108.9 L 190.0,95.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-21 atom-21 atom-22\" d=\"M 181.9,106.8 L 187.4,95.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-22 atom-22 atom-23\" d=\"M 190.0,95.3 L 181.5,82.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-23 atom-23 atom-24\" d=\"M 181.5,82.7 L 166.4,83.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-23 atom-23 atom-24\" d=\"M 180.4,85.1 L 167.8,85.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-24 atom-14 atom-9\" d=\"M 112.5,74.2 L 97.3,75.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-25 atom-24 atom-19\" d=\"M 166.4,83.7 L 159.7,97.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 24.4,133.0 L 25.1,133.0 L 25.5,132.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 31.5,120.0 L 31.8,119.3 L 32.2,119.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 53.5,104.9 L 53.7,104.6 L 54.4,104.6\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 75.1,90.6 L 75.5,90.0 L 75.8,89.9\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 89.3,63.3 L 88.8,62.7 L 89.2,62.0\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 95.2,49.7 L 95.5,49.0 L 96.3,49.0\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 109.9,48.1 L 110.7,48.0 L 111.1,48.6\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 118.7,60.0 L 119.1,60.6 L 118.8,61.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 120.5,86.2 L 120.9,86.8 L 121.7,86.8\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 135.3,85.9 L 136.1,85.8 L 136.5,86.4\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 144.1,97.8 L 144.6,98.4 L 145.3,98.4\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 167.8,109.3 L 168.2,110.0 L 168.9,109.9\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 182.6,109.0 L 183.3,108.9 L 183.7,108.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 189.7,96.0 L 190.0,95.3 L 189.6,94.7\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 182.0,83.3 L 181.5,82.7 L 180.8,82.7\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 167.1,83.7 L 166.4,83.7 L 166.0,84.4\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path class=\"atom-3\" d=\"M 46.0 116.1 L 47.4 118.4 Q 47.6 118.6, 47.8 119.0 Q 48.0 119.4, 48.0 119.5 L 48.0 116.1 L 48.6 116.1 L 48.6 120.4 L 48.0 120.4 L 46.5 117.9 Q 46.3 117.6, 46.1 117.3 Q 46.0 117.0, 45.9 116.9 L 45.9 120.4 L 45.3 120.4 L 45.3 116.1 L 46.0 116.1 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-3\" d=\"M 49.4 116.1 L 50.0 116.1 L 50.0 118.0 L 52.2 118.0 L 52.2 116.1 L 52.8 116.1 L 52.8 120.4 L 52.2 120.4 L 52.2 118.4 L 50.0 118.4 L 50.0 120.4 L 49.4 120.4 L 49.4 116.1 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-6\" d=\"M 75.3 116.2 Q 75.3 115.2, 75.8 114.6 Q 76.3 114.0, 77.3 114.0 Q 78.2 114.0, 78.7 114.6 Q 79.2 115.2, 79.2 116.2 Q 79.2 117.3, 78.7 117.9 Q 78.2 118.4, 77.3 118.4 Q 76.3 118.4, 75.8 117.9 Q 75.3 117.3, 75.3 116.2 M 77.3 118.0 Q 77.9 118.0, 78.3 117.5 Q 78.6 117.1, 78.6 116.2 Q 78.6 115.4, 78.3 114.9 Q 77.9 114.5, 77.3 114.5 Q 76.6 114.5, 76.3 114.9 Q 75.9 115.4, 75.9 116.2 Q 75.9 117.1, 76.3 117.5 Q 76.6 118.0, 77.3 118.0 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-6\" d=\"M 79.9 114.1 L 80.5 114.1 L 80.5 115.9 L 82.7 115.9 L 82.7 114.1 L 83.3 114.1 L 83.3 118.4 L 82.7 118.4 L 82.7 116.4 L 80.5 116.4 L 80.5 118.4 L 79.9 118.4 L 79.9 114.1 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-8\" d=\"M 88.7 88.9 Q 88.7 87.9, 89.2 87.3 Q 89.7 86.7, 90.6 86.7 Q 91.6 86.7, 92.1 87.3 Q 92.6 87.9, 92.6 88.9 Q 92.6 90.0, 92.1 90.6 Q 91.6 91.2, 90.6 91.2 Q 89.7 91.2, 89.2 90.6 Q 88.7 90.0, 88.7 88.9 M 90.6 90.7 Q 91.3 90.7, 91.6 90.2 Q 92.0 89.8, 92.0 88.9 Q 92.0 88.1, 91.6 87.7 Q 91.3 87.2, 90.6 87.2 Q 90.0 87.2, 89.6 87.7 Q 89.3 88.1, 89.3 88.9 Q 89.3 89.8, 89.6 90.2 Q 90.0 90.7, 90.6 90.7 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-16\" d=\"M 112.3 100.5 Q 112.3 99.5, 112.8 98.9 Q 113.3 98.3, 114.3 98.3 Q 115.2 98.3, 115.7 98.9 Q 116.2 99.5, 116.2 100.5 Q 116.2 101.5, 115.7 102.1 Q 115.2 102.7, 114.3 102.7 Q 113.3 102.7, 112.8 102.1 Q 112.3 101.6, 112.3 100.5 M 114.3 102.2 Q 114.9 102.2, 115.3 101.8 Q 115.6 101.4, 115.6 100.5 Q 115.6 99.7, 115.3 99.2 Q 114.9 98.8, 114.3 98.8 Q 113.6 98.8, 113.2 99.2 Q 112.9 99.6, 112.9 100.5 Q 112.9 101.4, 113.2 101.8 Q 113.6 102.2, 114.3 102.2 \" fill=\"#FF0000\"/>\n",
       "<path class=\"legend\" d=\"M 91.6 183.8 Q 92.7 184.3, 93.3 185.0 Q 93.9 185.6, 93.9 186.7 Q 93.9 187.7, 93.4 188.4 Q 92.9 189.2, 92.0 189.6 Q 91.1 190.0, 89.9 190.0 Q 88.0 190.0, 87.0 189.1 Q 85.9 188.3, 85.9 186.7 Q 85.9 185.8, 86.3 185.1 Q 86.8 184.4, 87.8 183.9 Q 87.1 183.5, 86.7 182.9 Q 86.3 182.2, 86.3 181.3 Q 86.3 180.0, 87.2 179.2 Q 88.2 178.4, 89.9 178.4 Q 91.5 178.4, 92.5 179.2 Q 93.5 180.0, 93.5 181.3 Q 93.5 182.1, 93.0 182.7 Q 92.5 183.3, 91.6 183.8 M 89.9 179.6 Q 88.9 179.6, 88.4 180.0 Q 87.9 180.5, 87.9 181.3 Q 87.9 181.9, 88.2 182.3 Q 88.6 182.7, 89.1 182.9 Q 89.6 183.1, 90.6 183.4 Q 91.3 183.0, 91.5 182.4 Q 91.9 181.9, 91.9 181.3 Q 91.9 180.5, 91.3 180.0 Q 90.8 179.6, 89.9 179.6 M 89.9 188.8 Q 91.0 188.8, 91.6 188.2 Q 92.3 187.7, 92.3 186.7 Q 92.3 186.1, 91.9 185.7 Q 91.6 185.3, 91.1 185.1 Q 90.6 184.9, 89.7 184.6 L 89.0 184.4 Q 88.2 184.8, 87.8 185.4 Q 87.5 186.0, 87.5 186.7 Q 87.5 187.7, 88.1 188.2 Q 88.8 188.8, 89.9 188.8 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 99.8 190.0 Q 97.8 190.0, 96.7 188.4 Q 95.7 186.9, 95.7 184.2 Q 95.7 181.4, 96.7 179.9 Q 97.7 178.4, 99.8 178.4 Q 101.9 178.4, 103.0 179.9 Q 104.0 181.4, 104.0 184.2 Q 104.0 186.9, 102.9 188.4 Q 101.9 190.0, 99.8 190.0 M 99.8 188.7 Q 101.1 188.7, 101.7 187.6 Q 102.4 186.4, 102.4 184.2 Q 102.4 181.9, 101.7 180.8 Q 101.1 179.7, 99.8 179.7 Q 98.6 179.7, 98.0 180.8 Q 97.3 181.9, 97.3 184.2 Q 97.3 186.4, 98.0 187.6 Q 98.6 188.7, 99.8 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 110.0 190.0 Q 107.9 190.0, 106.9 188.4 Q 105.8 186.9, 105.8 184.2 Q 105.8 181.4, 106.9 179.9 Q 107.9 178.4, 110.0 178.4 Q 112.1 178.4, 113.1 179.9 Q 114.1 181.4, 114.1 184.2 Q 114.1 186.9, 113.1 188.4 Q 112.1 190.0, 110.0 190.0 M 110.0 188.7 Q 111.2 188.7, 111.9 187.6 Q 112.5 186.4, 112.5 184.2 Q 112.5 181.9, 111.9 180.8 Q 111.2 179.7, 110.0 179.7 Q 108.8 179.7, 108.1 180.8 Q 107.4 181.9, 107.4 184.2 Q 107.4 186.4, 108.1 187.6 Q 108.8 188.7, 110.0 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 210.0,141.9 L 225.1,140.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 225.1,140.9 L 231.8,127.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 231.8,127.2 L 238.3,126.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 238.3,126.8 L 244.8,126.3\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-3 atom-4\" d=\"M 248.3,123.5 L 251.0,118.0\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-3 atom-4\" d=\"M 251.0,118.0 L 253.7,112.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 253.7,112.5 L 268.8,111.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 268.8,111.5 L 272.1,116.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 272.1,116.5 L 275.5,121.4\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6 atom-5 atom-7\" d=\"M 268.8,111.5 L 275.5,97.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-7 atom-8\" d=\"M 275.5,97.9 L 281.8,97.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-7 atom-8\" d=\"M 281.8,97.4 L 288.2,97.0\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 291.9,94.2 L 294.6,88.7\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 294.6,88.7 L 297.3,83.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-9 atom-10\" d=\"M 297.3,83.2 L 288.8,70.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-9 atom-10\" d=\"M 298.5,80.8 L 291.5,70.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-10 atom-11\" d=\"M 288.8,70.6 L 295.5,57.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11 atom-11 atom-12\" d=\"M 295.5,57.0 L 310.7,55.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11 atom-11 atom-12\" d=\"M 297.0,59.1 L 309.5,58.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 310.7,55.9 L 313.3,50.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 313.3,50.5 L 316.0,45.0\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-13 atom-12 atom-14\" d=\"M 310.7,55.9 L 319.1,68.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-14 atom-14 atom-15\" d=\"M 319.1,68.5 L 312.5,82.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-14 atom-14 atom-15\" d=\"M 316.5,68.7 L 311.0,80.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-15 atom-15 atom-16\" d=\"M 312.5,82.2 L 320.9,94.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-16 atom-17\" d=\"M 320.2,93.7 L 317.4,99.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-16 atom-17\" d=\"M 317.4,99.4 L 314.5,105.2\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-16 atom-17\" d=\"M 322.2,94.7 L 319.4,100.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-16 atom-17\" d=\"M 319.4,100.4 L 316.6,106.2\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-17 atom-16 atom-18\" d=\"M 320.9,94.8 L 336.1,93.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-18 atom-18 atom-19\" d=\"M 336.1,93.7 L 344.6,106.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-19 atom-20\" d=\"M 344.6,106.3 L 359.7,105.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-20 atom-20 atom-21\" d=\"M 359.7,105.3 L 368.2,117.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-20 atom-20 atom-21\" d=\"M 362.3,105.1 L 369.3,115.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-21 atom-21 atom-22\" d=\"M 368.2,117.9 L 383.3,116.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-22 atom-22 atom-23\" d=\"M 383.3,116.8 L 390.0,103.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-22 atom-22 atom-23\" d=\"M 381.9,114.7 L 387.4,103.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-23 atom-23 atom-24\" d=\"M 390.0,103.2 L 381.5,90.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-24 atom-24 atom-25\" d=\"M 381.5,90.6 L 366.4,91.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-24 atom-24 atom-25\" d=\"M 380.4,93.0 L 367.8,93.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-25 atom-15 atom-9\" d=\"M 312.5,82.2 L 297.3,83.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-26 atom-25 atom-20\" d=\"M 366.4,91.6 L 359.7,105.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 224.4,140.9 L 225.1,140.9 L 225.5,140.2\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 231.5,127.9 L 231.8,127.2 L 232.2,127.2\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 253.5,112.8 L 253.7,112.5 L 254.4,112.5\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 275.1,98.6 L 275.5,97.9 L 275.8,97.8\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 289.3,71.2 L 288.8,70.6 L 289.2,69.9\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 295.2,57.6 L 295.5,57.0 L 296.3,56.9\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 318.7,67.9 L 319.1,68.5 L 318.8,69.2\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 320.5,94.1 L 320.9,94.8 L 321.7,94.7\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 335.3,93.8 L 336.1,93.7 L 336.5,94.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 344.1,105.7 L 344.6,106.3 L 345.3,106.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 367.8,117.3 L 368.2,117.9 L 368.9,117.8\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 382.6,116.9 L 383.3,116.8 L 383.7,116.2\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 389.7,103.9 L 390.0,103.2 L 389.6,102.6\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 382.0,91.2 L 381.5,90.6 L 380.8,90.7\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 367.1,91.6 L 366.4,91.6 L 366.0,92.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path class=\"atom-3\" d=\"M 246.0 124.0 L 247.4 126.3 Q 247.6 126.5, 247.8 126.9 Q 248.0 127.4, 248.0 127.4 L 248.0 124.0 L 248.6 124.0 L 248.6 128.3 L 248.0 128.3 L 246.5 125.8 Q 246.3 125.6, 246.1 125.2 Q 246.0 124.9, 245.9 124.8 L 245.9 128.3 L 245.3 128.3 L 245.3 124.0 L 246.0 124.0 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-3\" d=\"M 249.4 124.0 L 250.0 124.0 L 250.0 125.9 L 252.2 125.9 L 252.2 124.0 L 252.8 124.0 L 252.8 128.3 L 252.2 128.3 L 252.2 126.3 L 250.0 126.3 L 250.0 128.3 L 249.4 128.3 L 249.4 124.0 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-6\" d=\"M 275.3 124.1 Q 275.3 123.1, 275.8 122.5 Q 276.3 121.9, 277.3 121.9 Q 278.2 121.9, 278.7 122.5 Q 279.2 123.1, 279.2 124.1 Q 279.2 125.2, 278.7 125.8 Q 278.2 126.4, 277.3 126.4 Q 276.3 126.4, 275.8 125.8 Q 275.3 125.2, 275.3 124.1 M 277.3 125.9 Q 277.9 125.9, 278.3 125.4 Q 278.6 125.0, 278.6 124.1 Q 278.6 123.3, 278.3 122.9 Q 277.9 122.4, 277.3 122.4 Q 276.6 122.4, 276.3 122.8 Q 275.9 123.3, 275.9 124.1 Q 275.9 125.0, 276.3 125.4 Q 276.6 125.9, 277.3 125.9 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-6\" d=\"M 279.9 122.0 L 280.5 122.0 L 280.5 123.8 L 282.7 123.8 L 282.7 122.0 L 283.3 122.0 L 283.3 126.3 L 282.7 126.3 L 282.7 124.3 L 280.5 124.3 L 280.5 126.3 L 279.9 126.3 L 279.9 122.0 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-8\" d=\"M 288.7 96.8 Q 288.7 95.8, 289.2 95.2 Q 289.7 94.7, 290.6 94.7 Q 291.6 94.7, 292.1 95.2 Q 292.6 95.8, 292.6 96.8 Q 292.6 97.9, 292.1 98.5 Q 291.6 99.1, 290.6 99.1 Q 289.7 99.1, 289.2 98.5 Q 288.7 97.9, 288.7 96.8 M 290.6 98.6 Q 291.3 98.6, 291.6 98.1 Q 292.0 97.7, 292.0 96.8 Q 292.0 96.0, 291.6 95.6 Q 291.3 95.1, 290.6 95.1 Q 290.0 95.1, 289.6 95.6 Q 289.3 96.0, 289.3 96.8 Q 289.3 97.7, 289.6 98.1 Q 290.0 98.6, 290.6 98.6 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-13\" d=\"M 315.4 42.3 Q 315.4 41.3, 315.9 40.7 Q 316.4 40.1, 317.3 40.1 Q 318.3 40.1, 318.8 40.7 Q 319.3 41.3, 319.3 42.3 Q 319.3 43.3, 318.8 43.9 Q 318.3 44.5, 317.3 44.5 Q 316.4 44.5, 315.9 43.9 Q 315.4 43.3, 315.4 42.3 M 317.3 44.0 Q 318.0 44.0, 318.3 43.6 Q 318.7 43.1, 318.7 42.3 Q 318.7 41.4, 318.3 41.0 Q 318.0 40.6, 317.3 40.6 Q 316.7 40.6, 316.3 41.0 Q 316.0 41.4, 316.0 42.3 Q 316.0 43.2, 316.3 43.6 Q 316.7 44.0, 317.3 44.0 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-13\" d=\"M 320.0 40.1 L 320.6 40.1 L 320.6 42.0 L 322.8 42.0 L 322.8 40.1 L 323.3 40.1 L 323.3 44.4 L 322.8 44.4 L 322.8 42.5 L 320.6 42.5 L 320.6 44.4 L 320.0 44.4 L 320.0 40.1 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-17\" d=\"M 312.3 108.4 Q 312.3 107.4, 312.8 106.8 Q 313.3 106.2, 314.3 106.2 Q 315.2 106.2, 315.7 106.8 Q 316.2 107.4, 316.2 108.4 Q 316.2 109.5, 315.7 110.0 Q 315.2 110.6, 314.3 110.6 Q 313.3 110.6, 312.8 110.0 Q 312.3 109.5, 312.3 108.4 M 314.3 110.2 Q 314.9 110.2, 315.3 109.7 Q 315.6 109.3, 315.6 108.4 Q 315.6 107.6, 315.3 107.1 Q 314.9 106.7, 314.3 106.7 Q 313.6 106.7, 313.2 107.1 Q 312.9 107.6, 312.9 108.4 Q 312.9 109.3, 313.2 109.7 Q 313.6 110.2, 314.3 110.2 \" fill=\"#FF0000\"/>\n",
       "<path class=\"legend\" d=\"M 282.8 186.0 L 284.2 186.0 L 284.2 187.3 L 282.8 187.3 L 282.8 189.9 L 281.3 189.9 L 281.3 187.3 L 275.4 187.3 L 275.4 186.3 L 280.4 178.5 L 282.8 178.5 L 282.8 186.0 M 277.3 186.0 L 281.3 186.0 L 281.3 179.6 L 277.3 186.0 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 289.9 190.0 Q 287.8 190.0, 286.8 188.5 Q 285.8 186.9, 285.8 184.2 Q 285.8 181.5, 286.8 179.9 Q 287.8 178.4, 289.9 178.4 Q 292.0 178.4, 293.0 179.9 Q 294.1 181.5, 294.1 184.2 Q 294.1 186.9, 293.0 188.5 Q 292.0 190.0, 289.9 190.0 M 289.9 188.7 Q 291.2 188.7, 291.8 187.6 Q 292.5 186.4, 292.5 184.2 Q 292.5 182.0, 291.8 180.8 Q 291.2 179.7, 289.9 179.7 Q 288.7 179.7, 288.0 180.8 Q 287.4 182.0, 287.4 184.2 Q 287.4 186.4, 288.0 187.6 Q 288.7 188.7, 289.9 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 300.1 190.0 Q 298.0 190.0, 297.0 188.5 Q 295.9 186.9, 295.9 184.2 Q 295.9 181.5, 297.0 179.9 Q 298.0 178.4, 300.1 178.4 Q 302.2 178.4, 303.2 179.9 Q 304.2 181.5, 304.2 184.2 Q 304.2 186.9, 303.2 188.5 Q 302.2 190.0, 300.1 190.0 M 300.1 188.7 Q 301.3 188.7, 302.0 187.6 Q 302.6 186.4, 302.6 184.2 Q 302.6 182.0, 302.0 180.8 Q 301.3 179.7, 300.1 179.7 Q 298.9 179.7, 298.2 180.8 Q 297.5 182.0, 297.5 184.2 Q 297.5 186.4, 298.2 187.6 Q 298.9 188.7, 300.1 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 310.3 190.0 Q 308.2 190.0, 307.1 188.5 Q 306.1 186.9, 306.1 184.2 Q 306.1 181.5, 307.1 179.9 Q 308.2 178.4, 310.3 178.4 Q 312.3 178.4, 313.4 179.9 Q 314.4 181.5, 314.4 184.2 Q 314.4 186.9, 313.4 188.5 Q 312.3 190.0, 310.3 190.0 M 310.3 188.7 Q 311.5 188.7, 312.1 187.6 Q 312.8 186.4, 312.8 184.2 Q 312.8 182.0, 312.1 180.8 Q 311.5 179.7, 310.3 179.7 Q 309.0 179.7, 308.4 180.8 Q 307.7 182.0, 307.7 184.2 Q 307.7 186.4, 308.4 187.6 Q 309.0 188.7, 310.3 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 320.4 190.0 Q 318.3 190.0, 317.3 188.5 Q 316.3 186.9, 316.3 184.2 Q 316.3 181.5, 317.3 179.9 Q 318.3 178.4, 320.4 178.4 Q 322.5 178.4, 323.5 179.9 Q 324.6 181.5, 324.6 184.2 Q 324.6 186.9, 323.5 188.5 Q 322.5 190.0, 320.4 190.0 M 320.4 188.7 Q 321.7 188.7, 322.3 187.6 Q 323.0 186.4, 323.0 184.2 Q 323.0 182.0, 322.3 180.8 Q 321.7 179.7, 320.4 179.7 Q 319.2 179.7, 318.5 180.8 Q 317.9 182.0, 317.9 184.2 Q 317.9 186.4, 318.5 187.6 Q 319.2 188.7, 320.4 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 433.9,121.3 L 436.6,115.8\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 436.6,115.8 L 439.3,110.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 439.3,110.3 L 454.5,109.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 454.5,109.3 L 457.8,114.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 457.8,114.3 L 461.1,119.2\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-2 atom-4\" d=\"M 454.5,109.3 L 461.1,95.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 461.1,95.7 L 467.5,95.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 467.5,95.2 L 473.8,94.8\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 477.6,92.0 L 480.3,86.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 480.3,86.5 L 483.0,81.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6 atom-6 atom-7\" d=\"M 483.0,81.0 L 474.5,68.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6 atom-6 atom-7\" d=\"M 484.1,78.6 L 477.1,68.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-7 atom-8\" d=\"M 474.5,68.4 L 481.2,54.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 481.2,54.8 L 496.3,53.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 482.6,56.9 L 495.2,56.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-9 atom-10\" d=\"M 496.3,53.7 L 504.8,66.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-10 atom-11\" d=\"M 504.8,66.3 L 498.1,80.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-10 atom-11\" d=\"M 502.2,66.5 L 496.6,77.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11 atom-11 atom-12\" d=\"M 498.1,80.0 L 506.6,92.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 505.9,91.5 L 503.0,97.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 503.0,97.2 L 500.2,103.0\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 507.9,92.5 L 505.1,98.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 505.1,98.2 L 502.2,104.0\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-13 atom-12 atom-14\" d=\"M 506.6,92.6 L 521.7,91.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-14 atom-14 atom-15\" d=\"M 521.7,91.5 L 530.2,104.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-15 atom-15 atom-16\" d=\"M 530.2,104.1 L 545.4,103.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-16 atom-17\" d=\"M 545.4,103.1 L 553.8,115.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-16 atom-17\" d=\"M 548.0,102.9 L 555.0,113.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-17 atom-17 atom-18\" d=\"M 553.8,115.7 L 569.0,114.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-18 atom-18 atom-19\" d=\"M 569.0,114.6 L 575.7,101.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-18 atom-18 atom-19\" d=\"M 567.5,112.5 L 573.0,101.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-19 atom-20\" d=\"M 575.7,101.0 L 567.2,88.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-20 atom-20 atom-21\" d=\"M 567.2,88.4 L 552.0,89.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-20 atom-20 atom-21\" d=\"M 566.0,90.8 L 553.5,91.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-21 atom-11 atom-6\" d=\"M 498.1,80.0 L 483.0,81.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-22 atom-21 atom-16\" d=\"M 552.0,89.4 L 545.4,103.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 439.2,110.6 L 439.3,110.3 L 440.1,110.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 460.8,96.4 L 461.1,95.7 L 461.5,95.6\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 474.9,69.0 L 474.5,68.4 L 474.8,67.7\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 480.8,55.4 L 481.2,54.8 L 481.9,54.7\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 495.6,53.8 L 496.3,53.7 L 496.7,54.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 504.4,65.7 L 504.8,66.3 L 504.5,67.0\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 506.2,91.9 L 506.6,92.6 L 507.3,92.5\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 521.0,91.6 L 521.7,91.5 L 522.2,92.1\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 529.8,103.5 L 530.2,104.1 L 531.0,104.1\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 553.4,115.1 L 553.8,115.7 L 554.6,115.6\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 568.2,114.7 L 569.0,114.6 L 569.3,114.0\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 575.3,101.7 L 575.7,101.0 L 575.2,100.4\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 567.6,89.0 L 567.2,88.4 L 566.4,88.5\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 552.8,89.4 L 552.0,89.4 L 551.7,90.1\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path class=\"atom-0\" d=\"M 424.3 121.8 L 424.9 121.8 L 424.9 123.7 L 427.1 123.7 L 427.1 121.8 L 427.7 121.8 L 427.7 126.1 L 427.1 126.1 L 427.1 124.1 L 424.9 124.1 L 424.9 126.1 L 424.3 126.1 L 424.3 121.8 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-0\" d=\"M 428.5 126.0 Q 428.6 125.7, 428.9 125.6 Q 429.1 125.4, 429.5 125.4 Q 429.9 125.4, 430.2 125.6 Q 430.4 125.9, 430.4 126.3 Q 430.4 126.7, 430.1 127.1 Q 429.8 127.5, 429.1 128.0 L 430.4 128.0 L 430.4 128.3 L 428.5 128.3 L 428.5 128.0 Q 429.1 127.6, 429.4 127.4 Q 429.7 127.1, 429.8 126.8 Q 430.0 126.6, 430.0 126.3 Q 430.0 126.0, 429.9 125.9 Q 429.7 125.7, 429.5 125.7 Q 429.3 125.7, 429.1 125.8 Q 429.0 125.9, 428.8 126.1 L 428.5 126.0 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-0\" d=\"M 431.7 121.8 L 433.1 124.1 Q 433.2 124.3, 433.5 124.7 Q 433.7 125.2, 433.7 125.2 L 433.7 121.8 L 434.3 121.8 L 434.3 126.1 L 433.7 126.1 L 432.2 123.6 Q 432.0 123.4, 431.8 123.0 Q 431.6 122.7, 431.6 122.6 L 431.6 126.1 L 431.0 126.1 L 431.0 121.8 L 431.7 121.8 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-3\" d=\"M 461.0 121.9 Q 461.0 120.9, 461.5 120.3 Q 462.0 119.7, 462.9 119.7 Q 463.9 119.7, 464.4 120.3 Q 464.9 120.9, 464.9 121.9 Q 464.9 123.0, 464.4 123.6 Q 463.9 124.2, 462.9 124.2 Q 462.0 124.2, 461.5 123.6 Q 461.0 123.0, 461.0 121.9 M 462.9 123.7 Q 463.6 123.7, 463.9 123.2 Q 464.3 122.8, 464.3 121.9 Q 464.3 121.1, 463.9 120.7 Q 463.6 120.2, 462.9 120.2 Q 462.3 120.2, 461.9 120.6 Q 461.6 121.1, 461.6 121.9 Q 461.6 122.8, 461.9 123.2 Q 462.3 123.7, 462.9 123.7 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-3\" d=\"M 465.6 119.8 L 466.2 119.8 L 466.2 121.6 L 468.4 121.6 L 468.4 119.8 L 468.9 119.8 L 468.9 124.1 L 468.4 124.1 L 468.4 122.1 L 466.2 122.1 L 466.2 124.1 L 465.6 124.1 L 465.6 119.8 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-5\" d=\"M 474.3 94.6 Q 474.3 93.6, 474.8 93.0 Q 475.3 92.5, 476.3 92.5 Q 477.2 92.5, 477.8 93.0 Q 478.3 93.6, 478.3 94.6 Q 478.3 95.7, 477.7 96.3 Q 477.2 96.9, 476.3 96.9 Q 475.3 96.9, 474.8 96.3 Q 474.3 95.7, 474.3 94.6 M 476.3 96.4 Q 476.9 96.4, 477.3 95.9 Q 477.7 95.5, 477.7 94.6 Q 477.7 93.8, 477.3 93.4 Q 476.9 92.9, 476.3 92.9 Q 475.6 92.9, 475.3 93.4 Q 474.9 93.8, 474.9 94.6 Q 474.9 95.5, 475.3 95.9 Q 475.6 96.4, 476.3 96.4 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-13\" d=\"M 497.9 106.2 Q 497.9 105.2, 498.4 104.6 Q 499.0 104.0, 499.9 104.0 Q 500.9 104.0, 501.4 104.6 Q 501.9 105.2, 501.9 106.2 Q 501.9 107.3, 501.4 107.8 Q 500.9 108.4, 499.9 108.4 Q 499.0 108.4, 498.4 107.8 Q 497.9 107.3, 497.9 106.2 M 499.9 108.0 Q 500.6 108.0, 500.9 107.5 Q 501.3 107.1, 501.3 106.2 Q 501.3 105.4, 500.9 104.9 Q 500.6 104.5, 499.9 104.5 Q 499.3 104.5, 498.9 104.9 Q 498.5 105.4, 498.5 106.2 Q 498.5 107.1, 498.9 107.5 Q 499.3 108.0, 499.9 108.0 \" fill=\"#FF0000\"/>\n",
       "<path class=\"legend\" d=\"M 482.7 186.0 L 484.1 186.0 L 484.1 187.3 L 482.7 187.3 L 482.7 189.9 L 481.2 189.9 L 481.2 187.3 L 475.3 187.3 L 475.3 186.3 L 480.3 178.5 L 482.7 178.5 L 482.7 186.0 M 477.2 186.0 L 481.2 186.0 L 481.2 179.6 L 477.2 186.0 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 493.0 186.0 L 494.4 186.0 L 494.4 187.3 L 493.0 187.3 L 493.0 189.9 L 491.5 189.9 L 491.5 187.3 L 485.7 187.3 L 485.7 186.3 L 490.6 178.5 L 493.0 178.5 L 493.0 186.0 M 487.5 186.0 L 491.5 186.0 L 491.5 179.6 L 487.5 186.0 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 500.2 190.0 Q 498.1 190.0, 497.1 188.5 Q 496.0 186.9, 496.0 184.2 Q 496.0 181.5, 497.1 179.9 Q 498.1 178.4, 500.2 178.4 Q 502.3 178.4, 503.3 179.9 Q 504.3 181.5, 504.3 184.2 Q 504.3 186.9, 503.3 188.5 Q 502.3 190.0, 500.2 190.0 M 500.2 188.7 Q 501.4 188.7, 502.1 187.6 Q 502.7 186.4, 502.7 184.2 Q 502.7 182.0, 502.1 180.8 Q 501.4 179.7, 500.2 179.7 Q 499.0 179.7, 498.3 180.8 Q 497.6 182.0, 497.6 184.2 Q 497.6 186.4, 498.3 187.6 Q 499.0 188.7, 500.2 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 510.4 190.0 Q 508.3 190.0, 507.2 188.5 Q 506.2 186.9, 506.2 184.2 Q 506.2 181.5, 507.2 179.9 Q 508.3 178.4, 510.4 178.4 Q 512.4 178.4, 513.5 179.9 Q 514.5 181.5, 514.5 184.2 Q 514.5 186.9, 513.5 188.5 Q 512.4 190.0, 510.4 190.0 M 510.4 188.7 Q 511.6 188.7, 512.2 187.6 Q 512.9 186.4, 512.9 184.2 Q 512.9 182.0, 512.2 180.8 Q 511.6 179.7, 510.4 179.7 Q 509.1 179.7, 508.5 180.8 Q 507.8 182.0, 507.8 184.2 Q 507.8 186.4, 508.5 187.6 Q 509.1 188.7, 510.4 188.7 \" fill=\"#000000\"/>\n",
       "<path class=\"legend\" d=\"M 520.5 190.0 Q 518.4 190.0, 517.4 188.5 Q 516.4 186.9, 516.4 184.2 Q 516.4 181.5, 517.4 179.9 Q 518.4 178.4, 520.5 178.4 Q 522.6 178.4, 523.6 179.9 Q 524.7 181.5, 524.7 184.2 Q 524.7 186.9, 523.6 188.5 Q 522.6 190.0, 520.5 190.0 M 520.5 188.7 Q 521.8 188.7, 522.4 187.6 Q 523.1 186.4, 523.1 184.2 Q 523.1 182.0, 522.4 180.8 Q 521.8 179.7, 520.5 179.7 Q 519.3 179.7, 518.6 180.8 Q 518.0 182.0, 518.0 184.2 Q 518.0 186.4, 518.6 187.6 Q 519.3 188.7, 520.5 188.7 \" fill=\"#000000\"/>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tmp_df = df_list[12].copy()\n",
    "tmp_df.sort_values(\"standard_value\",inplace=True)\n",
    "legends = [f\"{x:.0f}\" for x in tmp_df.standard_value]\n",
    "Chem.Draw.MolsToGridImage(uru.align_mols_to_template(tmp_df.scaffold.values[0],tmp_df.canonical_smiles),molsPerRow=5,useSVG=True,legends=legends)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd846efb-f6ca-4fce-a254-039ca44fbaea",
   "metadata": {},
   "source": [
    "Create a dataframe containing the number of structures from each paper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "1e012f60-59ab-4810-a70c-21b57181586d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>doi</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10.1016/j.bmcl.2011.07.006</td>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.1016/j.bmcl.2015.04.002</td>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10.1016/j.bmcl.2015.06.061</td>\n",
       "      <td>52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>10.1016/j.bmc.2016.02.031</td>\n",
       "      <td>51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10.1016/j.bmcl.2010.04.045</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>741</th>\n",
       "      <td>10.1021/acs.jmedchem.1c01015</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>742</th>\n",
       "      <td>10.1016/j.ejmech.2021.113674</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>743</th>\n",
       "      <td>10.1021/acs.jmedchem.5b01372</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>744</th>\n",
       "      <td>10.1021/jm2013248</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>745</th>\n",
       "      <td>10.1021/acsmedchemlett.7b00501</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>746 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                doi  count\n",
       "0        10.1016/j.bmcl.2011.07.006     56\n",
       "1        10.1016/j.bmcl.2015.04.002     56\n",
       "2        10.1016/j.bmcl.2015.06.061     52\n",
       "3         10.1016/j.bmc.2016.02.031     51\n",
       "4        10.1016/j.bmcl.2010.04.045     41\n",
       "..                              ...    ...\n",
       "741    10.1021/acs.jmedchem.1c01015      2\n",
       "742    10.1016/j.ejmech.2021.113674      2\n",
       "743    10.1021/acs.jmedchem.5b01372      2\n",
       "744               10.1021/jm2013248      2\n",
       "745  10.1021/acsmedchemlett.7b00501      2\n",
       "\n",
       "[746 rows x 2 columns]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combo_df = pd.concat(df_list)\n",
    "uru.value_counts_df(combo_df,\"doi\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4a1c767-719d-4948-9948-c8cc911d8f98",
   "metadata": {},
   "source": [
    "### 5. Determine which compounds can be purchased\n",
    "To do this, we will use [MolBloom](https://github.com/whitead/molbloom), a tool created by Andrew White. MolBloom uses a Bloom filter, a compact index that can quickly test whether a compound is in a database. Its `buy` function returns **True** or **False** to indicate whether a compound is commercially available. Because a Bloom filter can occasionally report a compound that is not actually in the underlying catalog, a **True** result is best treated as \"probably available\". Here, we'll use MolBloom to check whether each compound is in the [ZINC](https://cartblanche22.docking.org/) database of commercially available compounds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "e63cd018-cb45-4a3a-9e2e-3c166a00dc02",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "717bd336c8e84117adc608be9580e350",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/5838 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "combo_df['purchasable'] = [buy(smi, canonicalize=True) for smi in tqdm(combo_df.canonical_smiles)]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d175ead-f77f-4bd2-ae86-f8289360bd0f",
   "metadata": {},
   "source": [
    "Group the purchasable compounds by paper and scaffold, then keep only the groups that contain more than one compound and whose hERG values span at least one log unit (a 10-fold range)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "9b527ff6-97e2-4538-9ec9-c5599dd62961",
   "metadata": {},
   "outputs": [],
   "source": [
    "result_list = []\n",
    "# Group the purchasable compounds by paper (doc_id) and scaffold\n",
    "for (doc_id, scaffold), v in combo_df.query(\"purchasable > 0\").groupby([\"doc_id\",\"scaffold\"]):\n",
    "    if len(v) > 1:\n",
    "        min_val = v.standard_value.min()\n",
    "        max_val = v.standard_value.max()\n",
    "        # Activity range in log10 units; a value of 1 is a 10-fold difference\n",
    "        range_val = abs(np.log10(min_val) - np.log10(max_val))\n",
    "        result_list.append([doc_id,scaffold,len(v),min_val,max_val,range_val])\n",
    "# Keep groups spanning at least one log unit, widest range first\n",
    "cols = [\"doc_id\",\"scaffold\",\"num\",\"min_val\",\"max_val\",\"range\"]\n",
    "final_df = pd.DataFrame(result_list,columns=cols).query(\"range >= 1\").sort_values(\"range\",ascending=False).round(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd84595f-83b8-44a2-8104-d8dffffe614d",
   "metadata": {},
   "source": [
    "### 6. Remove duplicate scaffolds\n",
    "\n",
    "Several of the papers we found contain compounds that share the same scaffold. To simplify the analysis, we'll keep only one paper per scaffold. First, we sort the paper/scaffold groups by the number of compounds, in descending order. Then we use the pandas `drop_duplicates` method, which keeps the first occurrence by default, so each scaffold is represented by a paper with the most compounds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "a0d49646-f746-4aa5-b744-4ea01cc11463",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>doc_id</th>\n",
       "      <th>scaffold</th>\n",
       "      <th>num</th>\n",
       "      <th>min_val</th>\n",
       "      <th>max_val</th>\n",
       "      <th>range</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>44897</td>\n",
       "      <td>CN(C(c1cc([*:4])c([*:3])cc1Oc1ccc([*:1])c([*:2...</td>\n",
       "      <td>11</td>\n",
       "      <td>645.0</td>\n",
       "      <td>8290.0</td>\n",
       "      <td>1.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>126186</td>\n",
       "      <td>c1ccc2c(c1)Sc1ccc([*:1])cc1N2CC(C[*:3])[*:2]</td>\n",
       "      <td>10</td>\n",
       "      <td>539.6</td>\n",
       "      <td>10999.9</td>\n",
       "      <td>1.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>51887</td>\n",
       "      <td>O=C(NC(=S)Nc1cc([*:2])cc([*:3])c1[*:4])c1ccc([...</td>\n",
       "      <td>5</td>\n",
       "      <td>10.0</td>\n",
       "      <td>4720.0</td>\n",
       "      <td>2.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5706</td>\n",
       "      <td>Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1</td>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>28.2</td>\n",
       "      <td>1.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>42094</td>\n",
       "      <td>Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1</td>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>28.2</td>\n",
       "      <td>1.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5535</td>\n",
       "      <td>c1cc([*:3])ccc1-n1cc(C2CCN([*:2])CC2)c2cc([*:1...</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>204.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>20904</td>\n",
       "      <td>Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1</td>\n",
       "      <td>2</td>\n",
       "      <td>0.9</td>\n",
       "      <td>28.2</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>46532</td>\n",
       "      <td>Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1</td>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>27.5</td>\n",
       "      <td>1.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>55596</td>\n",
       "      <td>c1cc([*:3])ccc1-n1cc(C2CCN([*:2])CC2)c2cc([*:1...</td>\n",
       "      <td>2</td>\n",
       "      <td>14.8</td>\n",
       "      <td>204.2</td>\n",
       "      <td>1.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    doc_id                                           scaffold  num  min_val  \\\n",
       "5    44897  CN(C(c1cc([*:4])c([*:3])cc1Oc1ccc([*:1])c([*:2...   11    645.0   \n",
       "35  126186       c1ccc2c(c1)Sc1ccc([*:1])cc1N2CC(C[*:3])[*:2]   10    539.6   \n",
       "13   51887  O=C(NC(=S)Nc1cc([*:2])cc([*:3])c1[*:4])c1ccc([...    5     10.0   \n",
       "1     5706        Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1    3      1.0   \n",
       "3    42094        Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1    3      1.0   \n",
       "0     5535  c1cc([*:3])ccc1-n1cc(C2CCN([*:2])CC2)c2cc([*:1...    2      3.0   \n",
       "2    20904        Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1    2      0.9   \n",
       "8    46532        Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1    2      1.0   \n",
       "14   55596  c1cc([*:3])ccc1-n1cc(C2CCN([*:2])CC2)c2cc([*:1...    2     14.8   \n",
       "\n",
       "    max_val  range  \n",
       "5    8290.0    1.1  \n",
       "35  10999.9    1.3  \n",
       "13   4720.0    2.7  \n",
       "1      28.2    1.4  \n",
       "3      28.2    1.4  \n",
       "0     204.0    1.8  \n",
       "2      28.2    1.5  \n",
       "8      27.5    1.4  \n",
       "14    204.2    1.1  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "final_df.sort_values(\"num\",ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "f2af2db6-8b7a-4e65-9fbc-37f60bcda0fb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>doc_id</th>\n",
       "      <th>scaffold</th>\n",
       "      <th>num</th>\n",
       "      <th>min_val</th>\n",
       "      <th>max_val</th>\n",
       "      <th>range</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>44897</td>\n",
       "      <td>CN(C(c1cc([*:4])c([*:3])cc1Oc1ccc([*:1])c([*:2...</td>\n",
       "      <td>11</td>\n",
       "      <td>645.0</td>\n",
       "      <td>8290.0</td>\n",
       "      <td>1.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>126186</td>\n",
       "      <td>c1ccc2c(c1)Sc1ccc([*:1])cc1N2CC(C[*:3])[*:2]</td>\n",
       "      <td>10</td>\n",
       "      <td>539.6</td>\n",
       "      <td>10999.9</td>\n",
       "      <td>1.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>51887</td>\n",
       "      <td>O=C(NC(=S)Nc1cc([*:2])cc([*:3])c1[*:4])c1ccc([...</td>\n",
       "      <td>5</td>\n",
       "      <td>10.0</td>\n",
       "      <td>4720.0</td>\n",
       "      <td>2.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5706</td>\n",
       "      <td>Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1</td>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>28.2</td>\n",
       "      <td>1.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5535</td>\n",
       "      <td>c1cc([*:3])ccc1-n1cc(C2CCN([*:2])CC2)c2cc([*:1...</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>204.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    doc_id                                           scaffold  num  min_val  \\\n",
       "5    44897  CN(C(c1cc([*:4])c([*:3])cc1Oc1ccc([*:1])c([*:2...   11    645.0   \n",
       "35  126186       c1ccc2c(c1)Sc1ccc([*:1])cc1N2CC(C[*:3])[*:2]   10    539.6   \n",
       "13   51887  O=C(NC(=S)Nc1cc([*:2])cc([*:3])c1[*:4])c1ccc([...    5     10.0   \n",
       "1     5706        Fc1ccc(Cn2c(NC3CCN([*:1])CC3)nc3ccccc32)cc1    3      1.0   \n",
       "0     5535  c1cc([*:3])ccc1-n1cc(C2CCN([*:2])CC2)c2cc([*:1...    2      3.0   \n",
       "\n",
       "    max_val  range  \n",
       "5    8290.0    1.1  \n",
       "35  10999.9    1.3  \n",
       "13   4720.0    2.7  \n",
       "1      28.2    1.4  \n",
       "0     204.0    1.8  "
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "out_df = final_df.sort_values(\"num\",ascending=False).drop_duplicates(\"scaffold\")\n",
    "out_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2723d9a2-34bf-4eda-9c30-12da7064b8b5",
   "metadata": {},
   "source": [
    "### 7. Visualize the selected series\n",
    "Finally, we can use the [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/) [interact](https://ipywidgets.readthedocs.io/en/stable/examples/Using%20Interact.html) tool and [mols2grid](https://github.com/cbouy/mols2grid) to build a component for interactively viewing the selected series. First, we extract the purchasable compounds for each selected paper and scaffold from `combo_df`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "6836fb0e-5539-4228-a8e0-b44443f8515c",
   "metadata": {},
   "outputs": [],
   "source": [
    "detail_list = []\n",
    "for doc_id,scaffold in out_df[[\"doc_id\",\"scaffold\"]].values:\n",
    "    detail_list.append(combo_df.query(\"doc_id == @doc_id and scaffold == @scaffold and purchasable\"))\n",
    "detail_df = pd.concat(detail_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0a32c32-40c6-428b-a767-711b4ee0a189",
   "metadata": {},
   "source": [
    "Next, we can use the menu below to select a series by its ChEMBL `doc_id`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "57301ee6-3aa6-4f99-88e9-cd32e50ecf12",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "502b4ddcc4d6479eb24731c16a4082c0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(Dropdown(description='doc_id', options=(44897, 126186, 51887, 5706, 5535), value=44897),…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "@interact(doc_id=out_df.doc_id.unique())\n",
    "def display_rgroups(doc_id):\n",
    "    # detail_df holds only the purchasable compounds from the selected paper/scaffold series\n",
    "    return mols2grid.display(detail_df.query(\"doc_id == @doc_id\"),smiles_col=\"canonical_smiles\",size=(150,150),\n",
    "                      subset=[\"img\",\"standard_value\"],sort_by=\"standard_value\",selection=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44f0921c-cc3f-4eb1-9733-8adb1ddc809d",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
